13/01/2021

Schedule

  1. Working in different contexts: RStudio Projects
  2. Dynamic document generation: RMarkdown
  3. Version control: Git + GitHub
  4. Package management: renv
  5. Containerization: Docker
  6. Where should I start?
  7. Start collaborating

0. Kudos

0. Collaborating with yourself and others

1. Working in different contexts: RStudio Projects - What & Why?

  • What it does:
    • Lets you work in several separate contexts (projects), e.g. one for each experiment
    • Each project has its own working directory, workspace, history, and source documents
    • Each project is associated with a folder on your computer (= working directory)
  • Why it helps:
    • Have a separate, shareable working environment for each experiment
    • Keep all the files associated with a project together — data, scripts, results, figures
    • Work on multiple projects at once, each associated with its packages (and package versions), loaded data, etc.
    • Use only relative paths
    • Necessary basis for version control
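As a rough illustration of "keeping all files together", the sketch below creates one possible folder layout from the shell. The project name and the folder names are hypothetical examples (only the file name file_I_want.csv is borrowed from these slides), and everything is created in a temporary directory so nothing on your machine is touched.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical project name; replace it with your own.
project="sleepstudy-demo"

# One folder per kind of file, all inside the project root.
mkdir -p "$project/data" "$project/scripts" "$project/results" "$project/figures"

# Placeholder data file with made-up column names.
echo "subject,reaction" > "$project/data/file_I_want.csv"

# From the project root, the relative path data/file_I_want.csv
# points to the same file on every machine.
(cd "$project" && cat data/file_I_want.csv)
```

Because every path is relative to the project root, the whole folder can be moved or shared without editing any script.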

1. Working in different contexts: RStudio Projects – How?

  • In RStudio: File > New Project > …

1. Working in different contexts: RStudio Projects – Version 1: Create new project

1. Working in different contexts: RStudio Projects – Version 2: Create from existing directory

1. Working in different contexts: RStudio Projects – Version 3: Create from version control (Git)

1. Working in different contexts: RStudio Projects – Open and manage projects

1. Working in different contexts: RStudio Projects – Tricks and troubleshooting

  • Relative paths: path separators vary across operating systems, and the anchor point of a relative path depends on the context (e.g. the current working directory)
    • Use the here package (Müller, 2020) to define paths relative to the project root: read.csv(here::here("data", "file_I_want.csv"))

2. Dynamic document generation: RMarkdown - What & Why?

  • What it does:
    • Creates dynamic documents with embedded chunks of code (R, Python, Julia, Stan, …), computed results, written text etc. (= LaTeX)
    • Markdown files can be exported to documents (docx, rtf), presentations, PDFs, websites (html), …, e.g. using the knitr (Xie, 2015, 2020) and tinytex (Xie, 2015, 2020; for PDFs) packages
    • R code is dynamically rendered, and can be given in separate chunks (```{r}) or inline (`r …`)
  • Why it helps:
    • Simple language (≠ LaTeX)
    • Integrates directly with statistical software (RStudio)
    • Saves code AND output in one file
    • Reduces copy&paste errors: reported results consistent with actual results

2. Dynamic document generation: RMarkdown - How?

  • Installation: install.packages("rmarkdown") (Allaire et al., 2017)
  • Install the knitr package for easy access: install.packages("knitr") (Xie, 2015, 2020)
  • Open a markdown file (.Rmd): File > New File > R Markdown

2. Dynamic document generation: RMarkdown - Tricks & troubleshooting

  • If you don’t have RStudio installed: install Pandoc (http://pandoc.org) before installing rmarkdown
  • Lengthy R code chunks: Install the knitr package (Xie, 2014, 2015, 2020) to customize chunks and the knitting process
    • {r cache=TRUE, message=FALSE, warning=FALSE, results="hide", error=TRUE}
    • or use the knitr::opts_chunk$set() function
  • Knit to pdf: You need a LaTeX-installation
    • TinyTeX (Xie, 2010) is a lightweight, cross-platform LaTeX distribution (install.packages("tinytex"); tinytex::install_tinytex())
    • Separate code chunks by a blank line
  • Knit older .R code files: Put #' in front of any top-level prose, including the header, or use:
#/*
rmarkdown::render(input = rstudioapi::getSourceEditorContext()$path,
                  output_format = rmarkdown::github_document(),
                  knit_root_dir = getwd()) #*/

3. Version control: Git + GitHub - What & Why?

  • What it does:
    • Tracks changes to files (data and code) over time: Sequence of “snapshots” (commits)
    • Lets you “go back in time”: recall older versions of a file or revert the entire project
    • Changes between commits can be compared
    • Organized in repositories: Collection of all snapshots
    • GitHub: Popular server for sharing materials (privately or publicly) and collaborating via git (also: GitLab and others)
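The ideas of snapshots, comparing changes, and going back in time can be tried out safely in a throwaway repository. The following is a minimal sketch, not a recommended workflow: the file name, its contents, and the user name and e-mail are placeholders chosen for this example.

```shell
# Create a throwaway repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Identity for this demo repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

# First snapshot (commit).
echo "version 1" > results.txt
git add results.txt
git commit --quiet -m "Add first results"

# Second snapshot with a changed file.
echo "version 2" > results.txt
git commit --quiet -am "Update results"

# List the snapshots, then compare the last two.
git --no-pager log --oneline
git --no-pager diff HEAD~1 HEAD

# Go back in time: restore the file as it was one commit earlier.
git checkout HEAD~1 -- results.txt
cat results.txt
```

The last command prints the old content of the file again; the newer commit is still stored in the repository and can be recovered at any time.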

3. Version control: Git + GitHub - What & Why?

  • Why it helps:
    • Keep things organized and track changes
    • Clean up code
    • Language agnostic
    • (Remote) backup
    • Work together with collaborators (even simultaneously and in parallel: branches, merges, pull requests) and with your future self
    • Web interface for your project and for tracking issues
    • Easily connected to other services, e.g. osf.io
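Working in parallel via branches and merges can be sketched locally without GitHub. The example below assumes a fresh throwaway repository; the branch name add-plots, the file names, and the identity are made up for illustration.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# Analysis" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Remember the name of the default branch (master or main, depending on setup).
main_branch=$(git rev-parse --abbrev-ref HEAD)

# One person works on a separate branch ...
git checkout --quiet -b add-plots
echo "# plotting code goes here" > plots.R
git add plots.R
git commit --quiet -m "Add plotting script"

# ... while the default branch moves on independently.
git checkout --quiet "$main_branch"
echo "More text" >> README.md
git commit --quiet -am "Extend README"

# Merge: the two changes touch different files, so Git combines them itself.
git merge --quiet --no-edit add-plots
ls
```

If both branches had changed the same lines of the same file, the merge would stop and ask you to resolve the conflict by hand.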

3. Version control: Git + GitHub – Installation

  • Register an account with GitHub: https://github.com/
  • (Update R, RStudio, and your packages: update.packages(ask = FALSE, checkBuilt = TRUE))
  • Is Git installed? Open your shell (“Terminal” in RStudio or on Mac, “Command Prompt” on Windows) and type: git --version. If you see “git: command not found”, install Git:
  • Install Git - Mac: macOS offers to install the command line developer tools automatically. Click “Install”. If you don’t get the offer, type: xcode-select --install. Restart R.
  • Install Git - Windows: Install “Git Bash” (https://gitforwindows.org). Accept default settings. When asked about “Adjusting your PATH environment”, select “Git from the command line and also from 3rd-party software”. Restart R.
  • Configure Git: In the (Git Bash) shell, type
    • git config --global user.name 'your name'
    • git config --global user.email 'email associated with your GitHub account'
    • git config --global --list (Check whether everything worked)
  • Optional: Install a Git client. Find more info e.g. here: https://happygitwithr.com/git-client.html

3. Version control: Git + GitHub – Vocabulary

  • Vocabulary - Git:
    • Repo(sitory): Directory of files that Git manages holistically
    • Commit: Snapshot of all files in the repository, at a specific moment, each with a unique identifier (hash code or SHA) and description (commit message)
    • Diff: Set of differences between (any) two commits
    • Tag: Specific name for a certain snapshot (optional), e.g. “v1.0.3”, “preprint”, “submitted”
  • Vocabulary - GitHub
    • Push: Send your local Git commits to GitHub
    • Pull: Fetch new commits from GitHub and update your local repository with them
    • Merge conflict: Git can’t be certain how to jointly apply diffs from two commits to their common parent. Resolve by picking manually, avoid by pushing often.
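To make the terms commit, hash, and tag concrete, here is a small sketch in a throwaway repository. The file name, the tag name "submitted" (taken from the examples above), and the identity are placeholders.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit: a snapshot with a message and a unique hash (SHA).
echo "draft" > manuscript.txt
git add manuscript.txt
git commit --quiet -m "Add manuscript draft"
git rev-parse --short HEAD

# Tag: a memorable name for this particular snapshot.
git tag submitted
git tag --list

# The tag and the hash refer to the same commit.
git rev-parse --short "submitted^{commit}"
```

Both git rev-parse calls print the same abbreviated hash, which shows that a tag is just a readable label for one commit.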

3. Version control: Git + GitHub – Code along

  • Go to https://github.com/ and log in
  • Click “New repository”
    • Decide between “private” or “public”. Initialize with a README. Accept default for everything else.
    • Click “Create repository”
    • Copy the HTTPS URL of the repository

3. Version control: Git + GitHub – Code along

  • Clone your GitHub repo to your computer: Type git clone https://github.com/YOUR-USERNAME/YOUR-REPOSITORY.git (your link) in the shell (Terminal in RStudio or on Mac, Git Bash on Windows)
    • Make this repo your working directory (cd YOUR-REPOSITORY), list its files (ls), display README (head README.md), get info on its connection to GitHub (git remote show origin)
    • Make a local change: Add a new line to README, and verify change is noticed:
echo "This is the first change to my repo" >> README.md
git status
  • Stage (add), commit (commit -m "YOUR-COMMIT-MESSAGE"), and push the change. You may be asked for your username and password.
git add -A
git commit -m "A commit from my local computer"
git push
  • (Clean up: Delete your local repo: cd .., and then rm -rf YOUR-REPOSITORY/)

3. Version control: Git + GitHub - Tricks & Troubleshooting

  • GitHub: No long-term guarantee for availability of service (is commercial)
    • Mirror snapshots on HU servers/OSF/Zenodo/FigShare/…
  • GitHub: .md files are rendered as HTML, CSV files are shown as formatted tables, and README.md files act as the landing page. Use internal links to other files.
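One simple way to produce a snapshot that can be uploaded to another service is git archive, which packs the files of a given commit into a single archive. This is only a sketch of one option; the repository content, the identity, and the archive name snapshot.tar are placeholders, and the upload step itself depends on the service and is not shown.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# My project" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Pack all files of the current commit into one archive (without the .git history).
git archive --format=tar -o snapshot.tar HEAD

# Check what ended up in the archive.
tar -tf snapshot.tar
```

The archive contains the files exactly as they were in the chosen commit, so it can serve as a frozen copy alongside the GitHub repository.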

4. Package management: renv

4. renv – What & Why?

  • What it does:
    • Creates a project-specific library of packages in the project folder
    • Overrides install.packages() so that packages are installed into this local library
    • Keeps track of package versions in the renv.lock file


  • Why it helps:
    • Keeps package versions untouched by other projects
    • Allows you to revert to the previous state when an update has broken your analysis
    • Makes it easier to share package versions with your collaborators (e.g., via GitHub)

4. renv – How?

  • Install renv just like any other R package via install.packages("renv")
  • Initialize your project library via renv::init()
  • Alternatively, select “Use renv with this project” when creating the project
  • After successfully installing or updating packages, use renv::snapshot()
  • If you want to revert to previous state (e.g., if an update caused problems), use renv::restore()

4. renv – Code along

  • Initialize renv for your sleepstudy project using renv::init()
  • From the Files pane in RStudio, take a look at the renv.lock file
  • Install a new package:
install.packages("cowsay")
  • Actually use the package in one of your scripts:
cowsay::say("Hello world", "cow")
  • Write this change to the lockfile using renv::snapshot()
  • Commit and push your changes to GitHub

4. renv – How?

Restoring someone else’s package versions:

  1. Clone or pull the repository from GitHub
  2. Open the RStudio project (e.g. via the projectname.Rproj file)
  3. Use renv::restore() to install the package versions from the renv.lock file

4. renv – Troubleshooting

  • There might be some (inconsequential) warnings when switching between Mac and Windows
  • At least on Windows, you need to have Rtools installed when installing packages that are not on CRAN (https://cran.r-project.org/bin/windows/Rtools/)
  • Installing and loading packages may take a while, especially if your project lives on a network drive (such as N:/)

5. Containerization: Docker

5. Docker – What & Why?

  • What it does:
    • Creates a small, Linux-based virtual system (a container) on your computer
    • Makes it possible to run your scripts (or render your .Rmd files) on this virtual system
    • The recipe to build this system is stored in a Dockerfile that can be shared via GitHub


  • Why it helps:
    • Ensures long-term reproducibility
    • Prevents problems caused by differences between operating systems, base R versions, languages etc.

5. Docker – How?

  • Start a ready-made RStudio container from the shell:
docker run -d -e PASSWORD=1234 -p 8787:8787 -v /path/to/your/project:/home/rstudio/ rocker/rstudio
  • You can then access RStudio (running in the container) by opening http://localhost:8787 in your web browser (username: rstudio, password: 1234)
  • You can also build your own container by:
    • Choosing a base image from https://hub.docker.com/u/rocker (e.g., one including the tidyverse or LaTeX)
    • Creating a Dockerfile in your project directory, specifying additional steps to execute when building the container, e.g., install.packages("renv"); renv::restore()

  • For detailed instructions, see Peikert & Brandmaier (2020)
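A Dockerfile along these lines might look like the sketch below, written from the shell into a temporary directory. Several details are assumptions made for this example and are not taken from the slides: the image tag 4.0.3, the working directory /home/rstudio/project, the CRAN mirror URL, and the image name my-analysis. It also assumes that renv.lock sits next to the Dockerfile. The build command is commented out because it needs a running Docker installation.

```shell
# Write the example into a throwaway directory.
project=$(mktemp -d)
cd "$project"

cat > Dockerfile <<'EOF'
# Base image from the rocker project (the tag is an assumption; pick your R version).
FROM rocker/rstudio:4.0.3

# Hypothetical location for the project inside the container.
WORKDIR /home/rstudio/project

# Copy the lockfile into the image (assumes renv.lock is next to the Dockerfile).
COPY renv.lock renv.lock

# Install renv, then the package versions recorded in the lockfile.
RUN R -e "install.packages('renv', repos = 'https://cloud.r-project.org'); renv::restore()"
EOF

# Build the image; 'my-analysis' is a hypothetical image name.
# docker build -t my-analysis .

cat Dockerfile
```

Keeping the Dockerfile under version control together with renv.lock means that collaborators can rebuild the same environment from the repository alone.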

5. Docker – How?

  • Some additional tools based on Docker:
    • With Binder (https://mybinder.org) and Code Ocean (https://codeocean.com), you can run your analysis in the cloud; they will even create the Dockerfile for you if you don’t have your own
    • Singularity (https://sylabs.io) is an open-source container platform that can run Docker images and that you can use on systems where you don’t have root access (e.g., on high-performance computing clusters)

6. Where should I start?

  • Suggested order of steps, minimal and maximal version

7. Start collaborating